ARROW-10080: [R] Call gc() and try again in MemoryPool#8533
Conversation
|
I've replace explicit calls to |
|
Not in scope for this PR, may result in further spurious OOMs later:
|
romainfrancois
left a comment
There was a problem hiding this comment.
There's a lot I don't understand on the C++ side, but that otherwise looks good to me.
|
|
||
| // ARROW-10080: Allocation may fail spuriously since the garbage collector is lazy. | ||
| // Force it to run then try again in case any reusable allocations have been freed. | ||
| static cpp11::function gc = cpp11::package("base")["gc"]; |
There was a problem hiding this comment.
Maybe make gc() a member of GcMemoryPool instead of repeating it for each GcAndTryAgain ?
There was a problem hiding this comment.
The instance is static so it will only be initialized in the first call to GcAndTryAgain().
I had gc as a member of GcMemoryPool initially but it resulted in an instance of gc which didn't show up in trace(). I assume this is due to a subtlety with the timing of the constructor of g_pool but I didn't debug further.
There was a problem hiding this comment.
In the first call of each version of the template method GcAndTryAgain right ? Probably no big deal anyway.
|
I also had, in a branch that builds on top of #8256 ways to prematurely invalidate objects when we know they won't be used anymore. For example, in this function: collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) {
x <- ensure_group_vars(x)
# Pull only the selected rows and cols into R
if (query_on_dataset(x)) {
# See dataset.R for Dataset and Scanner(Builder) classes
tab <- Scanner$create(x)$ToTable()
} else {
# This is a Table/RecordBatch. See record-batch.R for the [ method
tab <- x$.data[x$filtered_rows, x$selected_columns, keep_na = FALSE]
}
if (as_data_frame) {
df <- as.data.frame(tab)
tab$invalidate() # HERE <<<<<<-------------
restore_dplyr_features(df, x)
} else {
restore_dplyr_features(tab, x)
}
}inside the Is this still worth having ? And in that case should I push this to #8256 cc @nealrichardson |
|
I think it's worthwhile to invalidate individual objects as soon as we know we won't need them since it removes unnecessary burden from the garbage collector; it'll essentially be a performance hint (and therefore removal is also justifiable if you want to keep things simple-as-possible). |
|
Agree. So I cherry picked that commit into the #8256 pull request so that we have |
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
|
CI failures are unrelated. Merging |
@bkietz I've bisected the commit history, and it does look like this change diff --git a/cpp/src/arrow/io/memory.cc b/cpp/src/arrow/io/memory.cc
index 1cde5e64e..88f006fe0 100644
--- a/cpp/src/arrow/io/memory.cc
+++ b/cpp/src/arrow/io/memory.cc
@@ -54,7 +54,7 @@ BufferOutputStream::BufferOutputStream(const std::shared_ptr<ResizableBuffer>& b
Result<std::shared_ptr<BufferOutputStream>> BufferOutputStream::Create(
int64_t initial_capacity, MemoryPool* pool) {
// ctor is private, so cannot use make_shared
+ DCHECK_NE(pool, nullptr);
auto ptr = std::shared_ptr<BufferOutputStream>(new BufferOutputStream);
RETURN_NOT_OK(ptr->Reset(initial_capacity, pool));
return ptr;
is "responsible" for breaking the JNI status check (you can also see it was passing on the first commit in this PR https://github.com/apache/arrow/runs/1310578738 and then failing at the end https://github.com/apache/arrow/runs/1329149428). I am not familiar with the code, and I was hoping for you to possibly help me understand why the check was added and why it might be segfaulting the test java/adapter/orc/src/test/java/org/apache/arrow/adapter/orc/OrcReaderTest.java. I have experimented with removing the check and the test passes as expected, but if the check was added to catch something I'd rather not cover it up. I noticed there were two follow ups ARROW-10420 and ARROW-10421 and I was wondering if this might be related? Edit: since the runs above were not merged and their branch was deleted it looks like GitHub dropped the logs. This is the specific test error I am referring to (grabbed from a run on the |
|
Here's some more context which could be helpful, the first coming from the run logs and the latter being a file that would be generated after the crash. Test logs: Test crash logs: |
|
@terencehonles Sorry for the breakage! I added this check when debugging injection of the customized memory pool. I didn't see that |
No prob, I'll try to see if I can figure out what the Java code is doing, but thanks for the input! |
|
It looks like the offending line is https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L229 and I'm updating that to not pass null, but since passing a null pointer was previously supported I'm going to also remove the check in case it's used elsewhere. |
`OrcStripeReaderJniWrapper::getSchema` previously used `nullptr` for the memory pool parameter to `arrow::ipc::SerializeSchema` to implicitly call `arrow::default_memory_pool()`. As part of ARROW-10080 (#8533), a check was placed to fail on `nullptr` being provided. This change removes the check, but also explicitly calls `arrow::default_memory_pool()` which is used elsewhere in the JNI wrapper. Closes #8595 from terencehonles/arrow-10499 Authored-by: Terence D. Honles <terence@honles.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Fixes the cases mentioned in #8533 (comment). Closes #9939 from lidavidm/arrow-10421 Authored-by: David Li <li.davidm96@gmail.com> Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Allocation may fail spuriously since the garbage collector is lazy. A custom MemoryPool can force it to run when allocation may have failed spuriously then try again in case any reusable allocations have been freed.